Skip to content

fix(rivetkit-core): cap SerializeState shutdown timeout at 15s#4742

Merged
NathanFlurry merged 1 commit intomainfrom
sleep-cleanup/02-serialize-state-timeout-15s
Apr 27, 2026
Merged

fix(rivetkit-core): cap SerializeState shutdown timeout at 15s#4742
NathanFlurry merged 1 commit intomainfrom
sleep-cleanup/02-serialize-state-timeout-15s

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 24, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Code Review: fix(rivetkit-core): cap SerializeState shutdown timeout at 15s

Overview

This is a minimal, focused change that reduces the SERIALIZE_STATE_SHUTDOWN_SANITY_CAP constant in rivetkit-rust/packages/rivetkit-core/src/actor/task.rs from 30 seconds to 15 seconds. This constant caps the maximum time the actor shutdown path will wait for a SerializeState reply before giving up and proceeding with an empty delta set.


Analysis

Correctness

The constant is used in save_final_state() as a deadline on the reply_rx oneshot receiver for a SerializeState { reason: Save } event sent during shutdown. Reducing from 30s to 15s is a conservative and reasonable cap — 15s is still generous for a serialization callback, and a faster cap means actors fail-faster on a hung user serializeState callback instead of blocking the shutdown pipeline for half a minute.

The timeout arm returns Vec::new() (empty deltas) and logs an error, which means a timed-out serialization still proceeds to save_state rather than hard-failing. The behavior on timeout is unchanged; only the wait window narrows.

Code Quality

  • The change is exactly what the title describes: one-line, one constant, no collateral drift.
  • The constant name (SANITY_CAP) correctly signals this is a guard against pathological slowness, not a configured SLA.

Concerns / Suggestions

  • Missing rationale in PR description. The PR template is unfilled. It would be helpful to know what motivated the reduction (e.g., observed shutdown hangs in production, a new shutdown-pipeline deadline that this must fit within, alignment with a documented timeout budget). Without that context it is hard to verify the value is correct vs. arbitrary.
  • No corresponding doc/limit update. Per .claude/reference/docs-sync.md, changes to actor timeouts should be reflected in website/src/content/docs/actors/limits.mdx. If this timeout is user-visible or documented, that file should be updated.
  • Consider making this configurable. If actor serializeState callbacks legitimately vary in cost (e.g., large in-memory state), 15s could be too tight for some workloads. An ActorConfig field (with 15s default) would allow tuning without a code change. That said, this is a sanity cap rather than a normal-operation limit, so a hardcoded constant is defensible.
  • Test coverage gap. There are no new or updated tests verifying behavior at the boundary (e.g., that a hung callback times out at ~15s and proceeds with empty deltas). If this was prompted by a real regression, a test that reproduces the scenario would be valuable.

Overall

The change is safe and well-scoped. The main gap is the empty PR description — the motivation and any related incidents should be documented so future maintainers understand why 15s was chosen over 30s (or any other value).

@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/01-core-remove-prevent-sleep branch from 96f71c0 to c43b558 Compare April 24, 2026 10:32
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/02-serialize-state-timeout-15s branch from f32030d to 96f22d0 Compare April 24, 2026 10:32
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/01-core-remove-prevent-sleep branch from c43b558 to f4dbc84 Compare April 24, 2026 11:48
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/02-serialize-state-timeout-15s branch from 96f22d0 to cfc540f Compare April 24, 2026 11:48
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/01-core-remove-prevent-sleep branch from f4dbc84 to b3b1cf8 Compare April 24, 2026 12:14
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/02-serialize-state-timeout-15s branch 2 times, most recently from 0c5605c to ed50b2a Compare April 24, 2026 12:32
@NathanFlurry NathanFlurry changed the base branch from sleep-cleanup/01-core-remove-prevent-sleep to graphite-base/4742 April 24, 2026 12:59
@NathanFlurry NathanFlurry force-pushed the sleep-cleanup/02-serialize-state-timeout-15s branch from ed50b2a to 9ba012c Compare April 24, 2026 13:16
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4742 to sleep-cleanup/01-core-remove-prevent-sleep April 24, 2026 13:16
Base automatically changed from sleep-cleanup/01-core-remove-prevent-sleep to main April 27, 2026 07:13
@NathanFlurry NathanFlurry merged commit cb87b8f into main Apr 27, 2026
18 of 22 checks passed
@NathanFlurry NathanFlurry deleted the sleep-cleanup/02-serialize-state-timeout-15s branch April 27, 2026 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant